A Two-Stage Random Forest-Based Pathway Analysis Method
Authors
Abstract
Pathway analysis provides a powerful approach for identifying the joint effect on disease of genes grouped into biologically based pathways. It is also an attractive approach for secondary analysis of genome-wide association study (GWAS) data, which may still yield new results from these valuable datasets. Most current pathway analysis methods focus on testing the cumulative main effects of genes in a pathway. However, for complex diseases, gene-gene interactions are expected to play a critical role in disease etiology. We extended a random forest-based method for pathway analysis by incorporating a two-stage design. We used simulations to verify that the proposed method has the correct type I error rates. We also used simulations to show that, in the presence of gene-gene interactions, the method is more powerful than both the original random forest-based pathway approach and the set-based test implemented in PLINK. Finally, we applied the method to a breast cancer GWAS dataset and a lung cancer GWAS dataset and identified interesting pathways that have implications for breast and lung cancers.
Similar resources
Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors
Background and Objectives: The purpose of this study was to predict the mortality rate of colorectal cancer in Iranian patients and to determine the factors affecting mortality in patients with colorectal cancer using random forest and logistic regression methods. Methods: Data from 304 patients with colorectal cancer registry from the Gastroenterology and Liver Research Center of Shah...
Three-stage inversion improvement for forest height estimation using dual-PolInSAR data
This paper addresses an algorithm for forest height estimation using single frequency single baseline dual polarization radar interferometry data. The proposed method is based on a physical two layer volume over ground model and is represented using polarimetric synthetic aperture radar interferometry (PolInSAR) technique. The presented algorithm provides the opportunity to take advantages of t...
Evaluation of risk factors of recurrence of Hodgkin's lymphoma using random survival forest and comparison with Cox regression model
Background: In many studies, Cox regression was used to assess the important factors that affect the survival of cancer patients based on demographic and clinical variables. The aim of this study was to determine the factors affecting the survival of patients with Hodgkin's lymphoma using the random survival forest (RSF) method and compare it with the Cox model. Methods: In this retrospective ...
Scheduling and Stochastic Capacity Estimation of an EV Charging Station with PV Rooftop Using Queuing Theory and Random Forest
Power capacity of EV charging stations could be increased by installing PV arrays on their rooftops. In these charging stations, power transmission can be bidirectional when needed. In this paper, a new method based on queuing theory and the random forest algorithm is proposed to calculate the net power of a charging station considering the random SOC of EVs. Due to estimation time constraints, a queuing model with...
3D Detection of Power-Transmission Lines in Point Clouds Using Random Forest Method
Inspection of power transmission lines using classic expert-based methods suffers from disadvantages such as high time and money consumption. The advent of UAVs and their application in aerial data gathering help to decrease the time and cost significantly. The purpose of this research is to present an efficient automated method for inspection of power transmission lines based on point c...